Diversity calculations
**********************

*Alpha diversity* refers to the diversity of OTUs/ASVs within a sample. *Beta diversity* refers to differences in community composition between different samples.

We have three ways of measuring diversity:

- *naive* diversity considers all OTUs/ASVs to be equally different from each other (this is also called taxonomic diversity);
- *phylogenetic* diversity takes evolutionary distances between OTUs/ASVs in a phylogenetic tree into account;
- *functional* diversity takes pairwise distances between OTUs/ASVs in a distance matrix into account.

In the Hill-based framework used to calculate the naive, phylogenetic, and functional diversity measures, the importance of the relative abundance of OTUs/ASVs can be tuned by the diversity order (q).

- A q of 0 means that relative abundance is not considered and OTUs/ASVs are treated as either present or absent in the samples.
- A q of 1 means that each OTU/ASV is weighted exactly according to its relative abundance in the sample.
- A q > 1 means that more weight is given to OTUs/ASVs with high relative abundance.

Alpha diversity
###############

.. code-block:: python

    diversity.naive_alpha(tab, q=1)

Returns taxonomic (or naive) alpha diversity values (i.e. Hill numbers) for all samples.

*tab* is a frequency table holding the read counts, typically obj['tab'].

*q* is the diversity order.

.. code-block:: python

    diversity.phyl_alpha(tab, tree, q=1, index='PD')

Returns alpha diversity values that take phylogenetic tree distances between sequences into account.

*tab* is a count table holding the read counts, typically obj['tab'].

*tree* is the tree dataframe, typically obj['tree'].

*q* is the diversity order.

*index* can be 'PD', 'D', or 'H'. 'PD' is the phylogenetic diversity, 'D' is the Hill diversity number corrected for phylogenetic relationships, and 'H' is the phylogenetic entropy. See Chao et al. (2010), *Phil. Trans. R. Soc. B,* 365, 3599-3609 (DOI: 10.1098/rstb.2010.0272).

.. code-block:: python

    diversity.func_alpha(tab, distmat, q=0, index='FD')

Returns alpha diversity values that take the pairwise distances between sequences into account. The values are the same as the functional diversity measures described by Chiu et al. (2014), *PLOS ONE,* 9(7), e100014 (DOI: 10.1371/journal.pone.0100014).

*tab* is a count table holding the read counts, typically obj['tab'].

*distmat* is a dataframe with pairwise distances between the sequences of OTUs/ASVs. It can be generated with the stats.sequence_comparison() function.

*q* is the diversity order for the Hill numbers.

*index* can be 'FD', 'MD', or 'D'. 'FD' is the functional diversity ("the effective total distance between species"), 'MD' is the mean functional diversity ("the effective sum of pairwise distances between a species and all other species"), and 'D' is the functional Hill number ("the effective number of equally abundant and equally distinct species"). The descriptions in quotation marks are from Chiu et al. (2014), *PLOS ONE,* 9(7), e100014 (DOI: 10.1371/journal.pone.0100014).

Beta diversity (dissimilarity)
##############################

The difference in community composition between different samples can be expressed as *beta diversity* or *dissimilarity*. Beta diversity takes a value between 1 (if the compared samples are identical to each other) and N (the number of samples being compared). For functional beta diversity, the range is 1 to N×N. Dissimilarity takes a value between 0 (if the compared samples are identical) and 1 (if the compared samples are completely different).

Dissimilarity can be calculated with a *local* or *regional* viewpoint. The local viewpoint quantifies the effective proportion of OTUs/ASVs in a sample that is not shared across all samples. The regional viewpoint quantifies the effective proportion of OTUs/ASVs in the pooled samples that is not shared across all samples.
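
The ranges stated above can be illustrated with a small NumPy sketch. It is not qdiv's implementation: the function names ``hill_number``, ``naive_beta_sketch``, and ``dissimilarity_sketch`` are hypothetical, and the formulas assume the multiplicative gamma/alpha decomposition with equal sample weights, as commonly presented in the Hill-number literature cited in this section.

.. code-block:: python

    import numpy as np

    def hill_number(p, q):
        """Naive Hill number of order q for a vector of non-negative weights."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]  # absent taxa never contribute (also avoids 0**0 and log(0))
        if np.isclose(q, 1.0):
            return float(np.exp(-np.sum(p * np.log(p))))
        return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

    def naive_beta_sketch(counts, q=1.0):
        """Beta diversity (gamma / alpha) for a taxa-by-samples count array.

        Assumption: all samples get equal weight.
        """
        counts = np.asarray(counts, dtype=float)
        n = counts.shape[1]                    # number of samples
        rel = counts / counts.sum(axis=0)      # relative abundances per sample
        gamma = hill_number(rel.mean(axis=1), q)
        alpha = hill_number((rel / n).ravel(), q) / n
        return gamma / alpha

    def dissimilarity_sketch(beta, n, q=1.0, viewpoint='local'):
        """Rescale beta diversity (1..n) to a dissimilarity (0..1)."""
        if np.isclose(q, 1.0):
            return float(np.log(beta) / np.log(n))
        exponent = (1.0 - q) if viewpoint == 'local' else (q - 1.0)
        return float((1.0 - beta ** exponent) / (1.0 - n ** exponent))

    # Rows are OTUs/ASVs, columns are two samples sharing two of four taxa.
    counts = [[10, 0], [5, 5], [5, 5], [0, 10]]
    beta_q0 = naive_beta_sketch(counts, q=0)
    regional_q0 = dissimilarity_sketch(beta_q0, n=2, q=0, viewpoint='regional')
    local_q0 = dissimilarity_sketch(beta_q0, n=2, q=0, viewpoint='local')

Under these assumptions, two identical samples give a beta diversity of 1 and two samples with no shared taxa give N = 2. At q = 0, the regional dissimilarity for the example coincides with the Jaccard dissimilarity computed from presence/absence.
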
See Chao and Chiu (2016), *Methods in Ecology and Evolution,* 7, 919-928.

.. code-block:: python

    diversity.naive_beta(tab, q=1, dis=True, viewpoint='local')

Returns a dataframe of pairwise dissimilarities between samples. Taxonomic (or naive) Hill-based dissimilarities of order q are calculated.

*tab* is a frequency table holding the read counts, typically obj['tab'].

*q* is the diversity order.

If *dis* is True, dissimilarities constrained between 0 and 1 are calculated; if *dis* is False, beta diversity values constrained between 1 and 2 are calculated.

*viewpoint* can be 'local' or 'regional'; the difference is described in Chao et al. (2014), *Annual Review of Ecology, Evolution, and Systematics,* 45, 297-324.

.. code-block:: python

    diversity.phyl_beta(tab, tree, q=1, dis=True, viewpoint='local')

Returns a dataframe of pairwise dissimilarities that take a phylogenetic tree into account. The values are the phylogenetic beta diversity measures described by Chiu et al. (2014), *Ecological Monographs,* 84(1), 21-44.

*tab* is a frequency table holding the read counts, typically obj['tab'].

*tree* is a dataframe with phylogenetic tree information, typically obj['tree'].

*q* is the diversity order.

If *dis* is True, dissimilarities constrained between 0 and 1 are calculated; if *dis* is False, beta diversity values constrained between 1 and 2 are calculated.

*viewpoint* can be 'local' or 'regional'.

.. code-block:: python

    diversity.func_beta(tab, distmat, q=1, dis=True, viewpoint='local')

Returns a dataframe of pairwise dissimilarities that take pairwise sequence distances into account. The values are the functional beta diversity measures described by Chiu et al. (2014), *PLOS ONE,* 9(7), e100014.

*tab* is a frequency table holding the read counts, typically obj['tab'].

*distmat* is a dataframe with pairwise distances between the sequences of OTUs/ASVs. It can be generated with the stats.sequence_comparison() function.

*q* is the diversity order.

If *dis* is True, dissimilarities constrained between 0 and 1 are calculated; if *dis* is False, beta diversity values constrained between 1 and 4 are calculated.

*viewpoint* can be 'local' or 'regional'.

.. code-block:: python

    diversity.jaccard(tab)

Returns a dataframe of pairwise Jaccard dissimilarities between samples.

*tab* is a count table holding the read counts, typically obj['tab'].

Note that Jaccard dissimilarity is the same as naive_beta(tab, q=0, dis=True, viewpoint='regional').

.. code-block:: python

    diversity.bray(tab)

Returns a dataframe of pairwise Bray-Curtis dissimilarities between samples.

*tab* is a count table holding the read counts, typically obj['tab'].

.. code-block:: python

    diversity.naive_multi_beta(obj, var='None', q=1)

Returns a pandas dataframe containing taxonomic (naive) beta diversity, and local and regional dissimilarity values, for categories of samples.

*obj* is the qdiv object.

*var* is the column heading in the meta data used to categorize the samples. If a category has two or more samples, beta diversity for that category is calculated.

*q* is the diversity order.

.. code-block:: python

    diversity.phyl_multi_beta(obj, var='None', q=1)

Returns a pandas dataframe containing phylogenetic beta diversity, and local and regional dissimilarity values, for categories of samples.

*obj* is the qdiv object.

*var* is the column heading in the meta data used to categorize the samples. If a category has two or more samples, beta diversity for that category is calculated.

*q* is the diversity order.

.. code-block:: python

    diversity.func_multi_beta(obj, distmat, var='None', q=1)

Returns a pandas dataframe containing functional beta diversity, and local and regional dissimilarity values, for categories of samples.

*obj* is the qdiv object.

*distmat* is a distance matrix with pairwise distances between OTUs/ASVs, typically generated by the stats.sequence_comparison() function.

*var* is the column heading in the meta data used to categorize the samples.
If a category has two or more samples, beta diversity for that category is calculated.

*q* is the diversity order.

.. code-block:: python

    diversity.dissimilarity_contributions(obj, var='None', q=1, divType='naive', index='local')

Returns a pandas dataframe with information about dissimilarity, number of samples, and the percentage contribution of each OTU/ASV to the observed dissimilarity.

*obj* is the qdiv object.

*var* is the column heading in the meta data used to categorize the samples. If a category has two or more samples, dissimilarity for that category is calculated.

*q* is the diversity order.

*divType* is the type of diversity: 'naive' or 'phyl'.

*index* is the type of dissimilarity index: 'local' or 'regional'.

Evenness
########

.. code-block:: python

    diversity.evenness(tab, tree='None', distmat='None', q=1, divType='naive', index='local', perspective='samples')

Returns evenness calculated according to Chao and Ricotta (2019), *Ecology,* 100(12), e02852. If q is 1, divType is 'naive', and index is either 'CR1', 'CR2', 'local', or 'regional', the evenness is identical to Pielou's evenness.

*tab* is a count table, typically obj['tab'].

*tree* would typically be obj['tree']. It must be specified only if divType='phyl'.

*distmat* is a distance matrix. It must be specified only if divType='func'.

*q* is the diversity order.

*divType* is 'naive', 'phyl', or 'func'.

*index* can be 'CR1', 'CR2', 'CR3', 'CR4', or 'CR5'. These refer to the indices described in Table 1 in Chao and Ricotta (2019). *index* can also be 'local', which is equal to CR2, or 'regional', which is equal to CR1.

*perspective* can be 'samples', which means an evenness value is calculated for each sample (column) in the count table. *perspective* can also be 'taxa', which means an evenness value is calculated for each OTU/ASV (row) in the count table.
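
As a point of reference for the special case mentioned above (q = 1, naive diversity), Pielou's evenness is the Shannon entropy divided by the logarithm of the number of observed taxa. The sketch below computes it for a single sample with NumPy. The function name ``pielou_evenness`` is hypothetical and is not part of qdiv, and returning NaN for samples with fewer than two taxa is an assumption made here.

.. code-block:: python

    import numpy as np

    def pielou_evenness(counts):
        """Pielou's evenness J = H / ln(S) for one sample's read counts."""
        counts = np.asarray(counts, dtype=float)
        p = counts[counts > 0] / counts.sum()   # relative abundances of observed taxa
        if p.size < 2:
            return float('nan')                 # undefined for fewer than two taxa
        shannon = -np.sum(p * np.log(p))        # Shannon entropy H
        return float(shannon / np.log(p.size))  # divide by ln(S), S = observed taxa

    even_sample = pielou_evenness([5, 5, 5, 5])
    skewed_sample = pielou_evenness([97, 1, 1, 1])

A perfectly even sample gives a value of 1, while a sample dominated by a single OTU/ASV gives a value close to 0.
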